Another common transformation of time-series is to apply a function over a fixed rolling window of data.
Note that rolling functions differ conceptually from aggregates: they are not calculated over disjoint subsets of the data, so the output has the same frequency as the original data.
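The distinction between an aggregate and a rolling function can be sketched with base R alone. This is only an illustration: it uses the built-in AirPassengers series and stats::filter() as a stand-in for the tidyquant workflow used in the rest of this section, and the variable names are chosen for this example.

```r
# AirPassengers ships with base R (datasets package): 144 monthly values.
x <- AirPassengers

# Aggregate: one mean per calendar year, computed over disjoint 12-month blocks.
yearly_mean <- aggregate(x, nfrequency = 1, FUN = mean)

# Rolling: one value per month, computed over overlapping 12-month windows.
# With sides = 1 each window holds the current value and the 11 before it.
rolling_mean <- stats::filter(x, rep(1 / 12, 12), sides = 1)

length(yearly_mean)   # 12 values, one per year
length(rolling_mean)  # 144 values, one per month (the first 11 are NA)
```

The aggregate changes the frequency of the series from monthly to yearly, while the rolling version keeps one value per month.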
Moving Averages
A common rolling function is the moving average: we calculate the average value of the time series over a fixed window of data.
ap_rollmean_sixmonth_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value,
    mutate_fun = rollapply,
    # rollapply args
    width = 6,
    align = "right",
    FUN = mean,
    # mean args
    na.rm = TRUE,
    # tq_mutate args
    col_rename = "mean_6m"
  )
ap_rollmean_sixmonth_tbl %>% glimpse()
Rows: 144
Columns: 3
$ month <yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun 194…
$ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 1…
$ mean_6m <dbl> NA, NA, NA, NA, NA, 124.5000, 130.5000, 135.5000, 136.1667, 1…
#ap_rollmean_sixmonth_tbl %>% summary()
We compare the two series by plotting the original time series against its moving average.
plot_tbl <- ap_rollmean_sixmonth_tbl %>%
  rename(orig = value) %>%
  gather('label', 'value', -month)

ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = label)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Comparison Plot of the Air Passenger Counts')

Note that the moving-average series does not start at the same timestamp as the original: a right-aligned six-month window needs six observations, so the first five values are NA.
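The way a right-aligned window produces its first value can be checked by hand. The sketch below is a base R approximation of the six-month calculation, assuming the built-in AirPassengers data matches airpassengers_tbl; it uses stats::filter() rather than the tq_mutate() and rollapply() call shown above.

```r
# Monthly passenger counts as a plain numeric vector.
x <- as.numeric(AirPassengers)
width <- 6

# Right-aligned moving average: each value averages the current month
# and the (width - 1) months before it.
ma6 <- as.numeric(stats::filter(x, rep(1 / width, width), sides = 1))

# The first complete window ends in June 1949, so positions 1 to 5 are NA.
head(ma6, 8)
sum(is.na(ma6))   # 5

# The first non-missing value is just the mean of the first six months.
mean(x[1:6])      # 124.5
```

In general a right-aligned window of width w leaves the first w - 1 positions empty.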
We can add multiple moving averages to a time series by chaining a series of tq_mutate() calls together.
ap_rollmean_multi_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value,
    mutate_fun = rollapply,
    # rollapply args
    width = 6,
    align = "right",
    FUN = mean,
    # mean args
    na.rm = TRUE,
    # tq_mutate args
    col_rename = "mean_6m"
  ) %>%
  tq_mutate(
    # tq_mutate args
    select = value,
    mutate_fun = rollapply,
    # rollapply args
    width = 12,
    align = "right",
    FUN = mean,
    # mean args
    na.rm = TRUE,
    # tq_mutate args
    col_rename = "mean_12m"
  )
ap_rollmean_multi_tbl %>% glimpse()
Rows: 144
Columns: 4
$ month <yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun 19…
$ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, …
$ mean_6m <dbl> NA, NA, NA, NA, NA, 124.5000, 130.5000, 135.5000, 136.1667, …
$ mean_12m <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 126.6667, 126.91…
ap_rollmean_multi_tbl %>% summary()
month value mean_6m mean_12m
Min. :1949 Min. :104.0 Min. :119.7 Min. :126.7
1st Qu.:1952 1st Qu.:180.0 1st Qu.:182.4 1st Qu.:190.1
Median :1955 Median :265.5 Median :259.2 Median :259.2
Mean :1955 Mean :280.3 Mean :280.2 Mean :278.2
3rd Qu.:1958 3rd Qu.:360.5 3rd Qu.:362.4 3rd Qu.:372.4
Max. :1961 Max. :622.0 Max. :534.0 Max. :476.2
NA's :5 NA's :11
As before, we now create a lineplot of the three values to show the effect of the different window sizes.
plot_tbl <- ap_rollmean_multi_tbl %>%
  rename(orig = value) %>%
  gather('label', 'value', -month)

ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = label)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Comparison Plot of the Air Passenger Counts')

The twelve-month series starts later than the six-month series because its wider window needs more observations before it produces a value (11 leading NAs rather than 5).
Other windowing functions can be applied in the same way. For example, adding the rolling standard deviation lets us draw a band of plausible values around the moving average.
ribbon_func <- function(x, na.rm = TRUE, ...) {
  mu <- mean(x, na.rm = na.rm)
  sigma <- sd(x, na.rm = na.rm)
  lower <- mu - 2 * sigma
  upper <- mu + 2 * sigma
  return(c(mu = mu, l2sd = lower, u2sd = upper))
}

ap_roll_ribbon_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value,
    mutate_fun = rollapply,
    # rollapply args
    width = 6,
    align = "right",
    by.column = FALSE,
    FUN = ribbon_func,
    # ribbon_func args
    na.rm = TRUE
  )
ap_roll_ribbon_tbl %>% glimpse()
Rows: 144
Columns: 5
$ month <yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun 1949,…
$ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115…
$ mu <dbl> NA, NA, NA, NA, NA, 124.5000, 130.5000, 135.5000, 136.1667, 134…
$ l2sd <dbl> NA, NA, NA, NA, NA, 106.66745, 109.00581, 114.00581, 114.94718,…
$ u2sd <dbl> NA, NA, NA, NA, NA, 142.3326, 151.9942, 156.9942, 157.3862, 159…
ap_roll_ribbon_tbl %>% summary()
month value mu l2sd
Min. :1949 Min. :104.0 Min. :119.7 Min. : 91.59
1st Qu.:1952 1st Qu.:180.0 1st Qu.:182.4 1st Qu.:153.06
Median :1955 Median :265.5 Median :259.2 Median :203.94
Mean :1955 Mean :280.3 Mean :280.2 Mean :211.32
3rd Qu.:1958 3rd Qu.:360.5 3rd Qu.:362.4 3rd Qu.:275.87
Max. :1961 Max. :622.0 Max. :534.0 Max. :399.01
NA's :5 NA's :5
u2sd
Min. :141.2
1st Qu.:218.0
Median :323.7
Mean :349.0
3rd Qu.:449.8
Max. :695.9
NA's :5
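To see where the first non-missing row of this output comes from, ribbon_func can be applied by hand to the first six observations. This is a small sanity check rather than part of the pipeline; first_window and band are names introduced for this example.

```r
# Same function as defined above: mean plus a two-standard-deviation band.
ribbon_func <- function(x, na.rm = TRUE, ...) {
  mu <- mean(x, na.rm = na.rm)
  sigma <- sd(x, na.rm = na.rm)
  lower <- mu - 2 * sigma
  upper <- mu + 2 * sigma
  return(c(mu = mu, l2sd = lower, u2sd = upper))
}

# Passenger counts for Jan 1949 to Jun 1949, i.e. the first complete window.
first_window <- c(112, 118, 132, 129, 121, 135)

# Returns a named vector with elements mu, l2sd and u2sd.
band <- ribbon_func(first_window)
band
```

The three values agree with the June 1949 entries in the glimpse() output (124.5, roughly 106.67 and 142.33). Because the function returns a named vector of length three, rollapply() yields three new columns.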
We now plot the original data against the moving average and its two-standard-deviation band.
ggplot(ap_roll_ribbon_tbl) +
  geom_line(aes(x = month, y = value)) +
  geom_line(aes(x = month, y = mu), colour = 'red') +
  geom_ribbon(aes(x = month, ymin = l2sd, ymax = u2sd),
              colour = 'grey', alpha = 0.25) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Ribbon Plot of the Air Passenger Counts (6 month window)')

We now repeat this process using a twelve-month window for the data.
ap_roll_12m_ribbon_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value,
    mutate_fun = rollapply,
    # rollapply args
    width = 12,
    align = "right",
    by.column = FALSE,
    FUN = ribbon_func,
    # ribbon_func args
    na.rm = TRUE
  )
ap_roll_12m_ribbon_tbl %>% glimpse()
Rows: 144
Columns: 5
$ month <yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun 1949,…
$ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115…
$ mu <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 126.6667, 126.9167,…
$ l2sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 99.22637, 100.00998…
$ u2sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 154.1070, 153.8234,…
ap_roll_12m_ribbon_tbl %>% summary()
month value mu l2sd
Min. :1949 Min. :104.0 Min. :126.7 Min. : 91.98
1st Qu.:1952 1st Qu.:180.0 1st Qu.:190.1 1st Qu.:145.35
Median :1955 Median :265.5 Median :259.2 Median :188.39
Mean :1955 Mean :280.3 Mean :278.2 Mean :195.91
3rd Qu.:1958 3rd Qu.:360.5 3rd Qu.:372.4 3rd Qu.:251.73
Max. :1961 Max. :622.0 Max. :476.2 Max. :327.63
NA's :11 NA's :11
u2sd
Min. :153.8
1st Qu.:242.0
Median :326.2
Mean :360.4
3rd Qu.:484.9
Max. :636.7
NA's :11
Having constructed the data, we once again create a ribbon plot with these quantities.
ggplot(ap_roll_12m_ribbon_tbl) +
  geom_line(aes(x = month, y = value)) +
  geom_line(aes(x = month, y = mu), colour = 'red') +
  geom_ribbon(aes(x = month, ymin = l2sd, ymax = u2sd),
              colour = 'grey', alpha = 0.25) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Ribbon Plot of the Air Passenger Counts (12 month window)')

Exercises
- Construct a 3 month moving average for the passenger data and compare it to the 6 and 12 month values.
- Calculate the 6 month and 12 month rolling average values for the Maine unemployment data.
- Construct the ribbon plot for the Maine unemployment data.
- Construct moving average data for the CBE dataset. This process may be made easier by reshaping the data.
Differences
Another common transformation of the data is to take the ‘first differences’ of the values, i.e. we convert the time series of values into one of differences. We discuss the reasons for this later on – for now we focus on the mechanics of creating first differences.
ap_firstdiff_tbl <- airpassengers_tbl %>%
  mutate(diff = value - lag(value, n = 1))
ap_firstdiff_tbl %>% glimpse()
Rows: 144
Columns: 3
$ month <yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun 1949,…
$ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 115…
$ diff <dbl> NA, 6, 14, -3, -8, 14, 13, 0, -12, -17, -15, 14, -3, 11, 15, -6…
ap_firstdiff_tbl %>% summary()
month value diff
Min. :1949 Min. :104.0 Min. :-101.000
1st Qu.:1952 1st Qu.:180.0 1st Qu.: -16.000
Median :1955 Median :265.5 Median : 4.000
Mean :1955 Mean :280.3 Mean : 2.238
3rd Qu.:1958 3rd Qu.:360.5 3rd Qu.: 22.500
Max. :1961 Max. :622.0 Max. : 87.000
NA's :1
Having calculated the differences, we now produce a lineplot of those values.
plot_tbl <- ap_firstdiff_tbl %>%
  rename(count = value) %>%
  gather('series', 'value', -month)

ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = series)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Value') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Plot of the Air Passenger Counts and First Differences')

As this plot shows, the first differences of the passenger data do not contain a trend.
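The same differences can be obtained with the base R function diff(), which gives a quick way to check the summary figures above. This sketch assumes the built-in AirPassengers series matches airpassengers_tbl; unlike the lag() approach, diff() drops the first position instead of returning NA there.

```r
# Monthly passenger counts as a plain numeric vector.
x <- as.numeric(AirPassengers)

# First differences: x[t] - x[t - 1], one fewer value than the original.
d <- diff(x)
length(d)   # 143
head(d)     # 6 14 -3 -8 14 13

# The differences telescope, so their mean depends only on the endpoints:
# (last value - first value) / (number of differences).
mean(d)
(x[length(x)] - x[1]) / (length(x) - 1)
```

Both expressions give about 2.238, the mean reported by summary(): the upward trend in the level of the series shows up in the differences only as a small positive average.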
Exercises
- Calculate the first differences for the Maine unemployment data.
- Create a lineplot of this data to inspect its values.
- Calculate the first differences for the CBE data.
- Create lineplots for the CBE differences.
- Using the lag() function with the Air Passenger data, calculate the percentage changes instead of the arithmetic changes.
- Construct the lineplot for the percentage change values.